
    Question-driven text summarization with extractive-abstractive frameworks

    Automatic Text Summarisation (ATS) is becoming increasingly important due to the exponential growth of textual content on the Internet. The primary goal of an ATS system is to generate a condensed version of the key aspects of the input document while minimizing redundancy. ATS approaches are extractive, abstractive, or hybrid. The extractive approach selects the most important sentences in the input document(s) and concatenates them to form the summary. The abstractive approach represents the input document(s) in an intermediate form and then constructs the summary using sentences different from the originals. The hybrid approach combines the extractive and abstractive approaches. Query-based ATS selects the information most relevant to an initial search query, while question-driven ATS produces concise and informative answers to specific questions from a document collection. In this thesis, a novel hybrid framework is proposed for question-driven ATS that takes advantage of both extractive and abstractive summarisation mechanisms. The framework consists of complementary modules that work together to generate an effective summary: (1) discovering appropriate non-redundant sentences as plausible answers, using a multi-hop question answering system based on a Convolutional Neural Network (CNN), a multi-head attention mechanism, and a reasoning process; and (2) rewriting the extracted sentences in an abstractive setup, using a novel transformer-based paraphrasing Generative Adversarial Network (GAN) model. In addition, a fusing mechanism is proposed for compressing the sentence pairs selected by a next-sentence-prediction model in the paraphrased summary. Extensive experiments on various datasets show that the model can outperform many question-driven and query-based baseline methods, and that it is adaptable to generating summaries for questions in both closed and open domains. An online summariser demo based on the proposed model is designed for industrial use in processing technical text.
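The two-stage extract-then-rewrite control flow described in the abstract can be sketched in a few lines. This is a minimal, pure-Python illustration only: the word-overlap scoring stands in for the CNN/multi-head-attention QA module, and the `paraphrase` stub stands in for the transformer-based paraphrasing GAN; neither placeholder reflects the thesis implementation.

```python
# Hypothetical sketch of a hybrid question-driven summariser:
# stage 1 extracts non-redundant candidate answer sentences,
# stage 2 rewrites them (here an identity stand-in for the GAN).

def relevance(question: str, sentence: str) -> float:
    """Toy relevance score: fraction of question words found in the sentence."""
    q = set(question.lower().split())
    s = set(sentence.lower().split())
    return len(q & s) / (len(q) or 1)

def extract(question: str, sentences: list, k: int = 2) -> list:
    """Stage 1: select top-k relevant, non-redundant sentences."""
    ranked = sorted(sentences, key=lambda s: relevance(question, s), reverse=True)
    selected = []
    for s in ranked:
        # Redundancy check: skip sentences too similar to ones already chosen.
        if all(relevance(s, t) < 0.8 for t in selected):
            selected.append(s)
        if len(selected) == k:
            break
    return selected

def paraphrase(sentence: str) -> str:
    """Stage 2 placeholder: the real system rewrites via a transformer GAN."""
    return sentence

def summarise(question: str, sentences: list, k: int = 2) -> str:
    """Extract candidate answers, then rewrite and join them into a summary."""
    return " ".join(paraphrase(s) for s in extract(question, sentences, k))
```

The design point the abstract makes is that extraction guarantees answer coverage while the rewriting stage restores abstractive fluency; the two stages are independently replaceable.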

    Adaptable Closed-Domain Question Answering Using Contextualized CNN-Attention Models and Question Expansion

    In closed-domain Question Answering (QA), the goal is to retrieve answers to questions within a specific domain. The main challenge of closed-domain QA is to develop a model that requires only small datasets for training, since large-scale corpora may not be available. One approach is a flexible QA model that can adapt to different closed domains and be trained on their corpora. In this paper, we present a novel, versatile reading-comprehension-style approach for closed-domain QA (called CA-AcdQA). The approach is based on pre-trained contextualized language models, a Convolutional Neural Network (CNN), and a self-attention mechanism. The model captures the relevance between the question and context sentences at different levels of granularity by exploring the dependencies between the features extracted by the CNN. Moreover, we include candidate answer identification and question expansion techniques for context reduction and for rewriting ambiguous questions. The model can be tuned to different domains with a small training dataset for sentence-level QA. The approach is tested on four publicly available closed-domain QA datasets: Tesla (person), California (region), EU-law (system), and COVID-QA (biomedical), against nine other QA approaches. Results show that the ALBERT model variant outperforms all approaches on all datasets, with a significant increase in Exact Match and F1 score. Furthermore, for COVID-19 QA, in which the text is complicated and specialized, the model improves considerably with additional biomedical training resources (an F1 increase of 15.9 over the next highest baseline).
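The question-expansion and context-reduction steps mentioned in the abstract can be illustrated with a toy sketch. Everything here is a hypothetical placeholder: the synonym table and overlap scoring stand in for the contextualized language models the paper actually uses.

```python
# Hypothetical sketch of question expansion and context reduction:
# expand ambiguous question terms with synonyms, then keep only the
# sentences most likely to contain the answer (sentence-level QA).

SYNONYMS = {"car": ["vehicle", "automobile"], "price": ["cost"]}  # assumed table

def expand_question(question: str) -> set:
    """Add synonyms of question terms so ambiguous wordings still match."""
    terms = set(question.lower().rstrip("?").split())
    for t in list(terms):
        terms.update(SYNONYMS.get(t, []))
    return terms

def reduce_context(question: str, sentences: list, k: int = 2) -> list:
    """Keep the k sentences sharing the most terms with the expanded question."""
    terms = expand_question(question)
    def score(s: str) -> int:
        return len(terms & set(s.lower().split()))
    return sorted(sentences, key=score, reverse=True)[:k]
```

Shrinking the context before answer extraction is what lets a small, domain-specific training set suffice: the reader only has to rank a handful of candidate sentences rather than a full document.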